Abstract — In this paper, spectrum access in cognitive radio 
networks is modeled as a repeated auction game subject to 
monitoring and entry costs. For secondary users, sensing costs 
are incurred as the result of primary users' activity. Further- 
more, each secondary user pays the cost of transmissions upon 
successful bidding for a channel. Knowledge regarding other 
secondary users' activity is limited due to the distributed nature 
of the network. The resulting formulation is thus a dynamic game 
with incomplete information. In this paper, an efficient bidding 
learning algorithm is proposed based on the outcome of past 
transactions. As demonstrated through extensive simulations, the 
proposed distributed scheme outperforms a myopic one-stage 
algorithm, and can achieve a good balance between efficiency 
and fairness. 



I. Introduction 

Recent studies have shown that despite claims of spectral 
scarcity, the actual licensed spectrum remains unoccupied for 
long periods of time [1]. Thus, cognitive radio (CR) systems 
have been proposed [2] in order to efficiently exploit these 
spectral holes. CRs or secondary users (SUs) are wireless 
devices that can intelligently monitor and adapt to their 
environment, hence, they are able to share the spectrum with 
the licensed primary users (PUs), operating whenever the 
PUs are idle. Three key design challenges are active topics 
of research in cognitive radio networks, namely, distributed 
implementation, spectral efficiency, and the tradeoff between 
sensing and spectrum access. 

Previous studies have tackled various aspects of spectrum 
sensing and spectrum access. In [3], the performance of 
spectrum sensing, in terms of throughput, is investigated when 
the SUs share their instantaneous knowledge of the channel. 
The work in [4] studies the performance of different detectors 
for spectrum sensing, while in [5] spatial diversity methods are 
proposed for improving the probability of detecting the PU by 
the SUs. Other aspects of spectrum sensing are discussed in 
[6] and [7]. Furthermore, spectrum access has also received 
increased attention, e.g., [8], [9], [10], [11], [12]. In [8], 
a dynamic programming approach is proposed to allow the 
SUs to maximize their channel access time while taking into 
account a penalty factor from any collision with the PU. The 
work in [8] (and the references therein) establishes that, in 
practice, the sensing time of CR networks is large and affects 
the access performance of the SUs. In [9], the authors model 
the spectrum access problem as a non-cooperative game, and 
propose learning algorithms to find the correlated equilibria 
of the game. Non-cooperative solutions for dynamic spectrum 
access are also proposed in [10] while taking into account 
changes in the SUs' environment such as the arrival of new 
PUs, among others. 

When multiple SUs compete for spectral opportunities, the 
issues of fairness and efficiency arise. On one hand, it is desir- 
able for an SU to access a channel with high availability. On 
the other hand, the effective achievable rate of an SU decreases 
when contending with many SUs over the most available 
channel. Consequently, efficiency of spectrum utilization in the 
system reduces. Therefore, an SU should explore transmission 
opportunities in other channels if available and refrain from 
transmission in the same channel all the time. Intuitively, 
diversifying spectrum access in both frequency (exploring 
more channels) and time (refraining from continuous trans- 
mission attempts) would be beneficial to achieving fairness 
among multiple SUs, in that SUs experiencing poorer channel 
conditions are not starved in the long run. 

The objective of this paper is to design a mechanism that 
enables fair and efficient sharing of spectral resources among 
SUs. We model spectrum access in cognitive radio networks 
as a repeated auction game with entry and monitoring costs. 
Auctioning the spectral opportunities is carried out repeatedly. 
At the beginning of each period, each SU that wishes to 
participate in the spectrum access submits a bid to a coor- 
dinator based on its view of the channel and past auction 
history. Knowledge regarding other secondary users' activities 
is limited due to the distributed nature of the network. The 
resulting formulation is thus a dynamic game with incomplete 
information. The bidder with the highest bid gains spectrum 
access. Entry fees are charged for all bidders who participate 
in the auction irrespective of the outcome of the auction. An 
SU can also choose to stay out (SO) of the current round, in 
which case no entry fee is incurred. At the end of each auction 
period, information regarding bidding and allocation are made 
available to all SUs, and in turn a monitoring fee is incurred. 

To achieve efficient bidding, a learning algorithm is pro- 
posed based on the outcome of past transactions. Each SU 
decides on local actions with the objective of increasing its 
long-term cost effectiveness. As demonstrated through exten- 
sive simulations, the proposed distributed scheme outperforms 
a myopic one-stage algorithm where an SU always participates 
in the spectrum access game in both single channel and multi- 
channel networks. 

A comment is in order on the feasibility of such an auction- 
based approach to spectrum access in practice. Due to commercial 
and industrial exploitation and different stakeholders' 
interests, the functional architectures and cognitive signaling 
schemes are currently under discussion within standardization 
forums, including IEEE SCC 41 and ETSI TC RRS (Recon- 
figurable Radio Systems). Cognitive pilot channel (CPC) has 
gained attention as a potential enabler of data-aided mitigation 
techniques between secondary and primary communication 
systems as well as a mechanism to support optimized radio 
resource and data management across heterogeneous networks. 
In CPC, a common control channel is used to provide 
information on the operators, Radio Access Technologies, 
and frequencies allocated in a given area. We can 
thus leverage the intelligence of the CPC coordinator and the 
control channel to solicit bidding and broadcast the outcome 
of auctions. 

The main contributions of this paper are: 

1) We have formulated the spectrum access problem in 
cognitive radio networks as a repeated auction game. 

2) A distributed learning algorithm is proposed for single- 
channel networks, and a non-regret learning algorithm 
is investigated for multi-channel networks. 

The rest of the paper is organized as follows. In Section II, 
the system model and terminology are introduced. Mechanism 
design of the repeated auction with learning is presented in 
Section III. Simulation results are given in Section IV, followed 
by conclusions and a discussion of future work in Section V. 

II. Physical Layer and System Model 

We consider a cognitive radio network consisting of K 
channels to be occupied by N SUs who compete repeatedly 
for access to the channels at each discrete time t. At time t, the 
$i$-th SU can reasonably estimate its channel rate $\theta_{i,k}^t$ in the $k$-th 
channel while having no knowledge of that of other SUs. In 
other words, each SU has imperfect information. We assume 
that both N and K are known to all SUs. The primary user's 
activity follows a Bernoulli distribution, i.e., at time t, the $k$-th 
channel is occupied with probability $\delta_k$. Without 
loss of generality, all secondary users use a common transmit 
power $P_0$ with a thermal noise level $\sigma^2$ at the base station. 
The channel gain for the $i$-th secondary user is assumed to be 
$G_i h_{i,k}^t$, where $G_i$ is the propagation path loss and $h_{i,k}^t$ follows 
the Rayleigh fading distribution. The rate for the $i$-th user on 
the $k$-th channel at time t can be written as 

$$\theta_{i,k}^t = W \log_2\left(1 + \frac{P_0 G_i h_{i,k}^t}{\sigma^2}\right), \quad (1)$$

where W is the bandwidth for each channel. All channels are 
assumed to have the same bandwidth for ease of exposition. 
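
As a rough illustration of the rate expression (1), the following Python sketch evaluates the rate for a given set of link parameters. The function name `channel_rate`, the base-2 logarithm, and the treatment of the fading term as a power gain are assumptions made for this example; they are not fixed by the text.

```python
import math


def channel_rate(bandwidth, tx_power, path_gain, fading_gain, noise_power):
    """Shannon-type rate W * log2(1 + SNR) for one SU on one channel.

    All arguments are in linear units (not dB). `fading_gain` is taken to be
    a power gain, which is an assumption of this sketch.
    """
    if noise_power <= 0:
        raise ValueError("noise_power must be positive")
    snr = tx_power * path_gain * fading_gain / noise_power
    return bandwidth * math.log2(1.0 + snr)


# Unit bandwidth and an SNR of 1 give a rate of exactly 1 (log2(2) = 1).
rate = channel_rate(bandwidth=1.0, tx_power=1.0, path_gain=1.0,
                    fading_gain=1.0, noise_power=1.0)
```

Because the rate grows only logarithmically with the received power, an SU with a poor path gain needs a disproportionately good fading realization to outbid a better-placed SU.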

At time t, an SU may incur two types of costs, namely, the 
cost of access $c_t$, which accounts for the energy expenditure 
needed for spectrum access; and the cost of monitoring $e_t$, 
which is the cost of sensing and subscribing to the control 
channel (e.g., CPC) to obtain information regarding past 
auctions. The spectrum access among SUs is modeled as a 
repeated auction. The access cost, also called the entry fee, 
is charged only when the user decides to participate in the 
auction at time t. On the other hand, SUs always need to 
pay the monitoring cost regardless of their decisions. At 
the beginning of a slot t, an SU chooses whether to stay out 
(SO) or participate in spectrum access. If the latter option 
is chosen, the SU (or bidder) sends a confidential message 
to the coordinator (or auctioneer) containing its bid. Let the 
bid submitted by SU i be $m_i^t$, which is a $K \times 1$ vector with 
component $m_{i,k}^t$ for the $k$-th channel. We define the set of 
actions of user i's opponents as 

$$m_{-i}^t = \{ m_j^t \mid j \in N \setminus i \}. \quad (2)$$

The cost incurred is thus 

$$e_t + c_t \, \mathbb{1}(m_{i,k}^t \neq SO),$$

where $\mathbb{1}(\cdot)$ is the indicator function. 

Fig. 1. Illustration of the System Model 

In each round, the bidder with the maximum bid wins 
and is granted the spectrum access opportunity. A payment is 
incurred accordingly. There are two key differences compared 
to existing spectrum access models, where upon sensing an 
idle channel, all SUs contend for spectrum access. First, an 
SU may choose to stay out if participating in the spectrum 
access game is deemed unfavorable (because of low data rate 
or large number of contending SUs). Second, the transmission 
opportunity in each available channel is granted only to a 
single SU after the auction. Therefore, no further contention 
will occur. Each SU is assumed to follow a symmetric strategy 
based on its local state and information learned. 

Figure 1 gives an illustration of the system model. An 
analogy can be drawn with a casino, in which gamblers 
choose which table to play at and how much to bet. Each 
secondary user shall decide which channel to sense and bid 
for, and how much the bid should be. These two issues will 
be addressed in the following sections for single-channel (i.e., 
K = 1) and multi-channel (i.e., K > 1) cognitive radio 
networks, respectively. 

III. Repeated Auction with Learning 

In this section, we investigate the spectrum access problem 
among multiple SUs in cognitive radio networks. We first 
discuss the auction mechanism and then define the utility 
function. Finally, an efficient bidding mechanism in repeated 
auctions with learning is proposed. 

A. Mechanism 

Recall from Section II that at the beginning of a slot t, an 
SU decides to either place a bid, or stay out and monitor the 
results. Based on the SUs' actions, the allocation strategy at 
time t for channel $k$ can be written as 

$$X_t(k) = \left\{ x_{1,k}^t, \ldots, x_{N,k}^t \;\middle|\; x_{i,k}^t \in \{0, 1\} \wedge \sum_i x_{i,k}^t = 1 \right\}, \quad \forall k.$$

If SU i chooses to stay out, then its allocation equals zero, 
i.e., 

$$x_{i,k}^t(m_{i,k}^t = SO) = 0. \quad (3)$$

The SU with the highest bid would win the right to access the 
channel, i.e., 

$$x_{i,k}^t = \begin{cases} 1, & m_{i,k}^t > m_{j,k}^t \ \forall j \neq i \text{ and primary user does not exist}, \\ 0, & \text{otherwise}. \end{cases} \quad (4)$$

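The winner would pay 

$$p_{i,k}^t = \begin{cases} 0, & m_{i,k}^t = SO \text{ or } \exists j,\ m_{j,k}^t > m_{i,k}^t \text{ or primary user exists}, \\ \hat{p}_{i,k}^t, & \text{otherwise} \end{cases} \quad (5)$$

for its bid, where 

$$\hat{p}_{i,k}^t = \begin{cases} m_{i,k}^t, & \text{First Price Auction}; \\ \max(m_{-i,k}^t), & \text{Second Price Auction}. \end{cases} \quad (6)$$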

For ease of presentation, a second price auction is assumed 
in the remaining discussion of the paper. The auction mechanism 
can be written as follows: 

1) SU i observes its current valuation (rate) $\theta_{i,k}^t$; 

2) SU i decides $m_{i,k}^t$; 

3) The mechanism implements $x_{i,k}^t$ and $p_{i,k}^t$; 

4) SU i observes $x_{i,k}^t$ and $p_{i,k}^t$. 
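
The allocation and payment rules of one auction round on a single channel can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name `run_auction`, the use of `None` to encode staying out, the zero payment for a lone bidder, and the tie-breaking rule (the first bidder in iteration order wins) are all assumptions introduced for illustration.

```python
from typing import Dict, Optional, Tuple

SO = None  # hypothetical encoding of the "stay out" action


def run_auction(bids: Dict[int, Optional[float]],
                pu_active: bool,
                second_price: bool = True) -> Tuple[Optional[int], float]:
    """Return (winner, payment) for one channel in one round.

    `bids` maps an SU index to its bid, or to SO if it stays out.
    The winner is None when the primary user is active or nobody bids.
    """
    active = {su: bid for su, bid in bids.items() if bid is not SO}
    if pu_active or not active:
        return None, 0.0

    # Highest bid wins; ties go to the first bidder found (an assumption).
    winner = max(active, key=lambda su: active[su])
    if not second_price:
        return winner, active[winner]  # first price: pay own bid

    # Second price: pay the highest competing bid (0 if unopposed).
    competing = [bid for su, bid in active.items() if su != winner]
    return winner, (max(competing) if competing else 0.0)


winner, payment = run_auction({1: 3.0, 2: 5.0, 3: SO}, pu_active=False)
```

In this usage example SU 2 wins and pays SU 1's bid, while SU 3 is never considered because it stayed out.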

Mechanisms and results for "one-shot" auction with and 
without entry fees have been well established in the liter- 
ature [20]. Typically, a symmetric and known strategy is 
assumed. In our formulation, at each stage of the auction, an 
SU decides on its action according to the bidding history monitored 
thus far. The number of participants varies from stage 
to stage depending on the SU's valuation and its knowledge 
regarding other players. 

B. Utility Function 

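To assess the expenditure in the course of the game, we 
define the accumulated cost for SU i at time t as 

$$c_i^t(h_i^{t-1}) = \sum_{k=1}^{K} \sum_{\tau=1}^{t-1} \left[ p_{i,k}^\tau + c_\tau \mathbb{1}(m_{i,k}^\tau \neq SO) + e_\tau \right], \quad (7)$$

where $h_i^{t-1}$ is the bidding history observed by SU i up to 
time t. The cost includes payment for the spectrum access 
opportunity, entry and monitoring fees over the history and 
across the different channels. 

The accumulated reward of SU i is given by 

$$r_i^t(h_i^{t-1}) = \sum_{k=1}^{K} \sum_{\tau=1}^{t-1} \theta_{i,k}^\tau x_{i,k}^\tau. \quad (8)$$

The utility is thus defined as the accumulated reward over 
the total cost, i.e., 

$$\gamma_i^t(h_i^{t-1}) = \frac{r_i^t(h_i^{t-1})}{c_i^t(h_i^{t-1})}. \quad (9)$$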



The utility function is essentially the revenue to cost ratio 
of the SU's actions over time. An SU will try to maximize its 
utility. Intuitively, when an SU's valuation is low compared 
with others, it is beneficial for the SU to stay out so that 
the entry cost is not incurred unnecessarily. On the other 
hand, staying out all the time leads to zero accumulation of 
revenue and starvation of the SU, and thus should be avoided. 
Optimizing (9) is difficult even in a centralized manner due 
to the large decision space. Therefore, distributed heuristic 
learning algorithms are warranted to determine $m_{i,k}^t$ at each 
SU individually. 
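
The bookkeeping behind the reward-to-cost utility can be sketched as follows for a single channel. This is an illustrative sketch only: the class name `Ledger` and its fields are hypothetical, and it assumes that the reward collected in a round equals the SU's rate when it wins and zero otherwise.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    """Running reward and cost totals for one SU (hypothetical helper)."""
    reward: float = 0.0
    cost: float = 0.0

    def record(self, rate, participated, won, payment,
               entry_fee, monitoring_fee):
        """Account for one auction round."""
        if won:
            self.reward += rate      # assumed reward: the rate when winning
            self.cost += payment     # auction payment, charged to the winner
        if participated:
            self.cost += entry_fee   # charged whether or not the SU wins
        self.cost += monitoring_fee  # charged in every round

    def utility(self):
        """Accumulated reward over accumulated cost."""
        return self.reward / self.cost if self.cost > 0 else 0.0


ledger = Ledger()
ledger.record(rate=4.0, participated=True, won=True, payment=1.0,
              entry_fee=2.0, monitoring_fee=1.0)
ledger.record(rate=3.0, participated=False, won=False, payment=0.0,
              entry_fee=2.0, monitoring_fee=1.0)
```

After the second round the SU has paid a monitoring fee without earning anything, so its utility drops; this is the starvation effect that makes staying out indefinitely unattractive.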

At time t, an SU decides whether to participate in the 
auction and, if so, its bid. If SU i's decision is to participate 
(i.e., $m_{i,k}^t \neq SO$), it can be proved straightforwardly that SU i's 
dominant strategy is to bid its own valuation in the second 
price auction. More formally, we have 

Proposition 1: The equilibrium of the repeated auction with 
utility function (9) consists of each bidder using the following 
strategy at time t: 

$$m_{i,k}^t = \begin{cases} SO, & f(\theta_{i,k}^t, h_i^{t-1}) > 0, \\ \theta_{i,k}^t, & \text{else}, \end{cases} \quad (10)$$

where $f(\theta_{i,k}^t, h_i^{t-1})$ is a function of SU i's current valuation 
and bidding history. The above strategy implies a thresholding 
criterion for participating in the game. The form of 
$f(\theta_{i,k}^t, h_i^{t-1})$ differs in the single channel and multi-channel 
scenarios, and will be discussed in more detail in subsequent 
sections. 
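
The claim that bidding one's own valuation is a dominant strategy in a second price auction can be checked numerically on a grid. The sketch below uses the textbook one-shot payoff (valuation minus payment when winning, zero otherwise) rather than the ratio utility defined above, which is a simplifying assumption; the function names are hypothetical.

```python
def second_price_payoff(valuation, bid, highest_other):
    """One-shot payoff: win only with a strictly higher bid, pay the rival bid."""
    if bid > highest_other:
        return valuation - highest_other
    return 0.0


def truthful_is_weakly_dominant(valuation, grid):
    """Check that no bid in `grid` ever beats bidding the valuation itself."""
    for highest_other in grid:
        truthful = second_price_payoff(valuation, valuation, highest_other)
        for bid in grid:
            if second_price_payoff(valuation, bid, highest_other) > truthful + 1e-12:
                return False
    return True


grid = [step / 10 for step in range(21)]  # candidate bids 0.0, 0.1, ..., 2.0
dominant = truthful_is_weakly_dominant(1.0, grid)
```

Overbidding only adds rounds in which the SU wins at a price above its valuation, and underbidding only removes rounds it would have won profitably, so neither deviation helps.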

C. Repeated Auction in a Single Channel 

When there is only a single channel, we can drop k in 
the notation. An SU stays out of bidding if it deems that 
participation is likely to reduce its payoff. Formally, $m_i^t = SO$ 
if 

$$\gamma_i^{t+1}(h_i^{t-1} : m_i^t = SO) > E_{\theta_{-i}^t}\left[ \gamma_i^{t+1}(h_i^{t-1} : m_i^t = \theta_i^t) \right]. \quad (11)$$

In (11), the expectation is taken over all possible valuations 
of SU i's opponents. 

In the first auction, no past history is available. The same 
thresholding function is applied at each SU under the assump- 
tion that the valuations of SUs are independent and identically 
distributed with cumulative distribution function (CDF) $F(\cdot)$ 
and probability density function (PDF) $f(\cdot)$. Therefore, the 
CDF and PDF of the largest valuation among an SU's $N-1$ opponents 
are $G(y) = F(y)^{N-1}$ and $g(y) = (N-1)F(y)^{N-2} f(y)$, 
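respectively. Let $r_i^1 = c_i^1 = 1$ for all $i$. The strategy for the 
first auction is stated as follows. 

Proposition 2: In the first auction, $m_i^1 = SO$ if and only if 
$\theta_i^1 < \epsilon$, where 

$$\epsilon G(\epsilon) = \frac{c_1}{1 + e_1}.$$

Proof: Since $\epsilon$ is the lowest valuation at which any SU 
participates in the auction, SU i with valuation $\epsilon$ wins the 
auction only when all other SUs have a valuation less than $\epsilon$. 
Therefore, 

$$\gamma_i^2(h_i^0 : m_i^1 = SO) = \frac{1}{1 + e_1},$$

and 

$$E_{\theta_{-i}^1}\left[ \gamma_i^2(h_i^0 : m_i^1 = \epsilon) \right] = \frac{1 + \epsilon G(\epsilon)}{1 + e_1 + c_1}.$$

To satisfy (11) with equality at the threshold, we have 

$$\epsilon G(\epsilon) = \frac{c_1}{1 + e_1}.$$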
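
For a concrete feel of the first-auction threshold, the sketch below solves the indifference condition numerically. It assumes the condition takes the form $\epsilon G(\epsilon) = c_1 / (1 + e_1)$ with $G(y) = F(y)^{N-1}$, which follows from equating the stay-out and participation utilities when the initial reward and cost are both 1. The function name `entry_threshold`, the bisection method, and the default uniform distribution on [0, 1] are assumptions of this example.

```python
def entry_threshold(n_users, entry_fee, monitoring_fee,
                    cdf=lambda y: y, lo=0.0, hi=1.0, tol=1e-10):
    """Solve eps * F(eps)**(N-1) = c1 / (1 + e1) for eps by bisection.

    `cdf` is the valuation CDF F on the support [lo, hi]; the default is
    uniform on [0, 1]. Returns None when even the largest valuation does
    not justify paying the entry fee.
    """
    target = entry_fee / (1.0 + monitoring_fee)

    def gap(y):
        return y * cdf(y) ** (n_users - 1) - target

    if gap(hi) < 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid   # threshold lies above mid
        else:
            hi = mid   # threshold lies at or below mid
    return 0.5 * (lo + hi)


# Uniform valuations: eps**N = c1 / (1 + e1), so N = 2 and a target of 0.25
# give a threshold of 0.5.
eps = entry_threshold(n_users=2, entry_fee=0.5, monitoring_fee=1.0)
```

Under these assumptions a larger entry fee or a larger number of SUs raises the threshold, so fewer SUs find it worthwhile to bid in the first round.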



Direct evaluation of (11) is difficult after the first auction. 
This is because the accumulated reward, cost and current 
valuation are only available to each SU individually (although 
the auctioneer provides the information regarding the highest 
bid and associated payment at the end of each stage). Next, 
we introduce a simple heuristic to approximate the right hand 
side of (11). SU i maintains a private threshold value $\theta_i$, 
initialized according to Proposition 2. At time t, SU i updates 



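$$\gamma_i^{t+1}(h_i^{t-1} : m_i^t = SO) = \frac{r_i^t}{c_i^t + e_t}.$$

Furthermore, the expected utility of bidding, 
$E_{\theta_{-i}^t}[\gamma_i^{t+1}(h_i^{t-1} : m_i^t = \theta_i^t)]$, is approximated by a ratio 
with denominator $c_i^t + e_t + c_t$, evaluated against the threshold 
$\theta_i$ in place of the opponents' unknown valuations. 

SU i's action is thus 

$$m_i^t = \begin{cases} SO, & \dfrac{r_i^t}{c_i^t + e_t} > E_{\theta_{-i}^t}\left[\gamma_i^{t+1}(h_i^{t-1} : m_i^t = \theta_i^t)\right], \\ \theta_i^t, & \text{else}, \end{cases} \quad (12)$$

with the expectation replaced by its approximation. 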



At the end of stage t, the SU obtains the largest bid and 
associated payment. If SU i chooses to stay out, but the 
payment of the winner is less than $\theta_i$, its $\theta_i$ is set to the 
payment amount. Otherwise, $\theta_i$ remains the same. On the other 
hand, if SU i participates in the auction but either loses the 
auction or is required to make a payment higher than $\theta_i$, its 
$\theta_i$ is set to the payment amount. To avoid fluctuation of the $\theta_i$ 
estimates, a moving average of old and new values can be 
applied. 

The above mechanism is summarized in Algorithm 1. 



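input : Number of SUs $N$, monitoring fee $e_t$ and entry fee $c_t$ at time t 
init  : Set $\theta_i = \epsilon$ with $\epsilon$ given by Proposition 2 
$a \leftarrow \gamma_i^{t+1}(h_i^{t-1} : m_i^t = SO)$; 
$b \leftarrow$ approximation of $E_{\theta_{-i}^t}[\gamma_i^{t+1}(h_i^{t-1} : m_i^t = \theta_i^t)]$, with denominator $c_i^t + e_t + c_t$; 
if $a > b$ or PU detected then 
    $m_i^t = SO$; 
else 
    $m_i^t = \theta_i^t$; 
end 
Let the maximum payment at stage t be $P_m(t)$; 
if $m_i^t = SO$ and $P_m(t) < \theta_i$ then 
    $\theta_i = P_m(t) \cdot \alpha + \theta_i \cdot (1 - \alpha)$; 
end 
if $m_i^t \neq SO$ and ($x_i^t = 0$ or $P_m(t) > \theta_i$) then 
    $\theta_i = P_m(t) \cdot \alpha + \theta_i \cdot (1 - \alpha)$; 
end 

Algorithm 1: Strategy in single-channel spectrum access 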
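
The threshold update at the end of each stage can be sketched in Python as follows. This is a sketch of the moving-average rule described above, not a reference implementation: the function name `update_threshold` and its argument names are hypothetical, and the default smoothing weight of 0.05 is only an example value.

```python
def update_threshold(theta, participated, won, max_payment, alpha=0.05):
    """Return the SU's private threshold after observing one stage.

    theta        -- current private threshold of the SU
    participated -- True if the SU placed a bid in this stage
    won          -- True if the SU was granted the channel
    max_payment  -- payment announced by the coordinator for this stage
    alpha        -- weight of the new observation in the moving average
    """
    # The SU stayed out although the channel sold for less than its threshold.
    too_cautious = (not participated) and max_payment < theta
    # The SU bid but lost, or had to pay more than its threshold.
    too_optimistic = participated and ((not won) or max_payment > theta)

    if too_cautious or too_optimistic:
        return alpha * max_payment + (1.0 - alpha) * theta
    return theta


theta = 10.0
theta = update_threshold(theta, participated=False, won=False, max_payment=6.0)
```

A small `alpha` makes the threshold react slowly to a single unusual round, which is the purpose of averaging old and new values.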



D. Non-Regret Algorithm for the Multi-channel Case 

In this section, we will address the spectrum access problem 
in multi-channel networks. A class of algorithms called regret- 
matching [19] is explored. The resulting stationary solution 
of the learning algorithm exhibits no regret by setting the 
probability of a particular action proportional to the "regrets" 
for not having played other actions. In particular, for any two 
distinct actions $m_i \neq \hat{m}_i$ at every time t, the regret of SU i 
at time t for not playing $\hat{m}_i$ is 

$$R_i^t(m_i, \hat{m}_i) = \max\{ D_i^t(m_i, \hat{m}_i), 0 \}, \quad (13)$$

where 

$$D_i^t(m_i, \hat{m}_i) = \frac{1}{\nu} \sum_{t-\nu < \tau \le t} \left( r_i^\tau(\hat{m}_i, m_{-i}^\tau) - r_i^\tau(m_i, m_{-i}^\tau) \right), \quad (14)$$

with $\nu$ denoting the size of the time window. $D_i^t(m_i, \hat{m}_i)$ has the 
interpretation of the average payoff gain that SU i would have obtained 
if it had bid $\hat{m}_i$ every time in the past instead of choosing 
$m_i$. The expression $R_i^t(m_i, \hat{m}_i)$ can be viewed as a measure 
of the average regret. In the context of spectrum access in 
multi-channel networks, the alternative actions correspond to 
participating in the auction game in different channels.¹ The 
probability $P_{i,k}^t$ for SU i to join auctions in channel $k$ is a 
linear function of the regret. Here $P_i^t$ is a K-by-1 probability 
vector with $\sum_{k=1}^{K} P_{i,k}^t = 1$. Define $\mathbb{1}(m_i)$ as the indicator 
vector for whether or not SU i competes in the $k$-th channel. 
The detailed regret-matching algorithm is given in Algorithm 
2. The complexity of the algorithm is $O(K)$ and it can be 
implemented distributively. Furthermore, its convergence has 
been proved in the literature [19]. Once SU i chooses the 
channel to access, its action is decided by Algorithm 1. Note 
that even though an SU can access only one channel at a 
time, the bidding history on all data channels is made available 
through the control channel to all SUs. 
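
One step of a regret-matching channel selection rule can be sketched in Python as follows. The sketch follows the standard regret-matching update, in which the probability of switching to another channel is the positive part of the windowed average payoff gain divided by a large constant; the function name, the argument `window_payoffs` (the payoff the SU would have obtained on each channel in each recent stage), and the symbol `kappa` for the constant are assumptions of this example.

```python
def regret_matching_update(window_payoffs, current, kappa):
    """Return next-stage channel probabilities for one SU.

    window_payoffs -- list of rows, one per stage in the time window;
                      row[k] is the payoff the SU would have obtained on
                      channel k in that stage
    current        -- index of the channel the SU is using now
    kappa          -- normalizing constant, assumed sufficiently large
    """
    window = len(window_payoffs)
    n_channels = len(window_payoffs[0])
    probs = [0.0] * n_channels

    for k in range(n_channels):
        if k == current:
            continue
        # Average gain from having played channel k instead of `current`.
        avg_gain = sum(row[k] - row[current] for row in window_payoffs) / window
        # Regret is the positive part; switching probability is linear in it.
        probs[k] = max(avg_gain, 0.0) / kappa

    switch_mass = sum(probs)
    if switch_mass > 1.0:
        raise ValueError("kappa is too small for a valid distribution")
    probs[current] = 1.0 - switch_mass  # remaining mass stays on `current`
    return probs


probs = regret_matching_update([[1.0, 3.0], [2.0, 2.0]], current=0, kappa=4.0)
```

Each update touches every channel once, which is consistent with a per-stage cost that grows linearly in the number of channels.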



¹Each user decides which channel to bid on, and then uses the value $\theta_{i,k}^t$ 
to bid on that channel. 

init : The probability vector $P_i^1$ of SU i is set arbitrarily 
foreach t = 1, 2, 3, ... do 
    Find $D_i^t(m_i, \hat{m}_i)$ as in (14); 
    Find the average regret $R_i^t(m_i, \hat{m}_i)$ as in (13); 
    $P_i^{t+1}(\hat{m}_i) = \frac{1}{\kappa} R_i^t(m_i, \hat{m}_i)$ for all $\hat{m}_i \neq m_i$, and 
    $P_i^{t+1}(m_i) = 1 - \sum_{\hat{m}_i \neq m_i} P_i^{t+1}(\hat{m}_i)$, where 
    $\kappa$ is a certain constant that is sufficiently large; 
end 

Algorithm 2: Non-regret learning algorithm for the multi-channel case 

IV. Simulation 

In this section, we investigate the performance of the 
proposed schemes by simulations. We construct a network of 
dimensions 100 m-by-100 m, in which the SUs are randomly 
placed. All SUs transmit to a base station at a fixed location 
1000 m away from the center of the network. The propagation 
loss exponent is set to be 3. The common transmission power 
level of all SUs is set to be 100 mW and the noise level at 
-90 dBm. A unit bandwidth is assumed with frame length at 
100 µs, and Doppler frequency 100 Hz. We set $\alpha = 0.05$ and 
$\nu = 10$. The proposed schemes are compared to a myopic 
scheme, in which SUs always participate in the auctions. 

Fig. 2. Convergence of repeated auction games under the proposed and 
myopic schemes. 2 SUs, 1 channel 

Single channel: First, we consider a simple 2-user case to 
understand the convergence of the proposed algorithm. Figure 
2 shows a snapshot of the change of the utility function $\gamma_i^t$ 
for user i over time. The entry and monitoring fees are fixed 
at $c_t = 10$ and $e_t = 1$, respectively. From the figure, we 
can see that the proposed scheme converges quickly and then 
tracks the changes in the channel. Furthermore, compared to 
the myopic scheme in which the SUs always bid, the utilities 
attained are higher for both users. This is because the SUs 
can decide whether to bid or not based on their valuations and 
the outcome of past auctions. Between time 4000 and 6000, a 
primary user is active, and all SUs stay out of the auction 
but still pay the monitoring fee. The average value of 
$\gamma$ decreases during that period of time. After the primary 
user stops transmitting, the auction game resumes. Since the 
effects of a primary user are very predictable, in the remaining 
simulations we assume the primary user is always idle. 

In Figure 3, we demonstrate the effects of entry and monitoring 
costs on performance. The proposed scheme is shown to 
achieve better performance in all cases and the average gain is 
up to 15%. We can see that when the monitoring fee is fixed, 
as the entry fee ($c_t$) increases, the average utility decreases. 
This is expected as it is more expensive to participate in the 
auction. Furthermore, the gap between the utilities attained by 
the proposed scheme and the myopic scheme also increases. 
This is because in the proposed scheme SUs are selective and 
would participate in the auction only if they are likely to win. 
The myopic scheme would incur high losses in revenue as the 
result of a higher entry cost. 

Fig. 3. Effects of monitoring and entry costs. 2 SUs 

Fig. 4. Fairness achieved by SUs 

In Figure 4, we show the fairness achieved in the proposed 
and myopic schemes. We adopt Jain's fairness index [21]: 

$$F = \frac{\left( \sum_{i=1}^{N} x_i \right)^2}{N \sum_{i=1}^{N} x_i^2},$$

where $x_i$ denotes the share attained by SU $i$. 
Clearly, F is between 0 and 1. The larger the value, the better 
the fairness is. We can see that the proposed scheme results in 
fairer resource allocation compared with the myopic bidding 
scheme. As the entry cost increases, the fairness of the myopic 
scheme also decreases. This is because users experiencing 
worse channels are repeatedly penalized by losing the game 
and paying entry fees. In comparison, when $e_t = 10$ and $e_t = 5$, 
the proposed scheme achieves slightly better fairness as the 
entry cost $c_t$ increases. 
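
Jain's fairness index is simple to compute from the per-SU shares; a minimal Python sketch follows. The function name `jain_index` is hypothetical, and which per-SU quantity is fed in (for example, the utilities attained) is left to the caller.

```python
def jain_index(shares):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Equals 1 when all shares are identical and 1/n when a single user
    receives everything.
    """
    n = len(shares)
    sum_of_squares = sum(x * x for x in shares)
    if n == 0 or sum_of_squares == 0:
        raise ValueError("at least one share must be nonzero")
    total = sum(shares)
    return (total * total) / (n * sum_of_squares)


equal = jain_index([2.0, 2.0, 2.0, 2.0])   # perfectly fair allocation
skewed = jain_index([1.0, 0.0, 0.0, 0.0])  # one SU takes everything
```

The index depends only on the relative sizes of the shares, so scaling every share by the same factor leaves it unchanged.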

In the next set of experiments, we set the number of SUs to 
be $N = 16$. Figure 5 compares the SUs' average 
valuation R, average bid b and instantaneous bid at time 5000, 
respectively. Several observations are in order. First, users 
with a higher average valuation generally also have 
a higher average bid, though not always. This is because the 
average bid also includes the case in which an SU stays out 
(treated as a zero bid). Second, as expected, not all the users 
are bidding in each slot; only the users with low cost and high 
chance of winning would participate. 

Fig. 5. Valuation and bids of SUs 

Fig. 6. Utility achieved as the number of SUs changes 

Fig. 7. Fairness achieved as the number of SUs changes 

Fig. 8. Utility in two-SU two-channel networks under different algorithms: 
Best Channel Bidding (BCB), Genie Aided (GA), and Non-Regret Learning (NRL) 

In Figure 6 and Figure 7, we show the utility and fairness 
as a function of the number of SUs varying from 2 to 16. 
The costs are set as $c_t = 10$ and $e_t = 1$, respectively. We can 
see that as the number of SUs increases, the utility decreases 
due to limited radio resources. The fairness also decreases 
since there might be more chances for users to dominate when 
the number of users is large. The proposed scheme has better 
performance in both utility and fairness, compared with the 
myopic scheme. The gain in utility ranges from 12% to 25%. 

Multiple channels: In this set of experiments, we study 
the performance and convergence of a two-user two-channel 
case. The parameters are set as $c_t = 5$, $e_t = 5$, and $\nu = 10$. 
Three schemes are compared, namely, Best Channel Bidding 
(BCB), Genie Aided (GA), and Non-Regret Learning (NRL). 
In BCB, the users select to bid on the channel with the highest 
channel gain. In GA, a genie tells the SUs not to bid on the 
channel that they would not win and instead to bid on the 
other channels. The GA solution is thus the performance upper 
bound for practical systems. We can see that BCB has 
the worst performance since the SUs might bid on the same 
channel while the other channels are vacant. The proposed 
NRL solution, on the other hand, performs close to the 
GA solution, and can be easily implemented in a distributed 
manner. 

V. Conclusions 

In this paper, we have investigated the problem of spectrum 
access in single and multi-channel cognitive radio networks. 
A repeated auction based framework has been adopted. In 
single-channel spectrum access, SUs selectively participate in 
the auction based on their valuation and past auction history. 
This scheme has been shown to outperform a myopic scheme 
in which SUs always compete. In multi-channel networks, a 
non-regret approach has been proposed. Its performance has 
been shown to be significantly better than a naive greedy 
solution and come close to that of the genie-aided solution. 
As future work, we plan to improve on the convergence speed 
and optimality of the proposed scheme. Also of interest is 
the study of robust mechanisms for situations in which the 
monitored information may be inaccurate. 